On the State Complexity of Partial Derivative Automata For Regular Expressions with Intersection

Author: BG Mirkin, E van der Vlist, H Gruber, H Petersen, JA Brzozowski, JM Champarnaud, K Sen, M Fürer, P Caron, P Flajolet, S Broda, T Christiansen, T Jiang, V Antimirov, W Gelade
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Extended regular expressions (with complement and intersection) are used in many applications due to their succinctness. In particular, regular expressions extended with intersection only (also called semi-extended) can already be exponentially smaller than standard regular expressions or equivalent nondeterministic finite automata (NFA). For practical purposes it is important to study the average behaviour of conversions between these models. In this paper, we focus on the conversion of regular expressions with intersection to nondeterministic finite automata, using partial derivatives and the notion of support. First, we give a tight upper bound of 2^{O(n)} for the worst-case number of states of the resulting partial derivative automaton, where n is the size of the expression. Using the framework of analytic combinatorics, we then establish an upper bound of (1.056 + o(1))^n for its asymptotic average-state complexity, which is significantly smaller than the one for the worst case. (c) IFIP International Federation for Information Processing 2016
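
The conversion described in this abstract can be illustrated with a minimal sketch of Antimirov-style partial derivatives extended to intersection. This is not the authors' code: the tuple encoding of expressions and the helper names (`sym`, `alt`, `cat`, `star`, `conj`, `pd`, `pd_automaton_states`, `accepts`) are assumptions made for illustration, and the only simplification applied is that concatenating the empty word with s gives s. The states of the partial derivative automaton are the expressions reachable from the input by repeated partial derivation.

```python
from itertools import product

EPS = ("eps",)      # the empty word
EMPTY = ("empty",)  # the empty language

def sym(a): return ("sym", a)
def alt(r, s): return ("alt", r, s)
def star(r): return ("star", r)
def conj(r, s): return ("and", r, s)  # intersection

def cat(r, s):
    # Simplify eps . s to s so that derivatives stay small.
    return s if r == EPS else ("cat", r, s)

def nullable(r):
    """True if the language of r contains the empty word."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # "cat" and "and"

def pd(r, a):
    """Set of partial derivatives of r with respect to the letter a."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return set()
    if tag == "sym":
        return {EPS} if r[1] == a else set()
    if tag == "alt":
        return pd(r[1], a) | pd(r[2], a)
    if tag == "cat":
        out = {cat(x, r[2]) for x in pd(r[1], a)}
        if nullable(r[1]):
            out |= pd(r[2], a)
        return out
    if tag == "star":
        return {cat(x, r) for x in pd(r[1], a)}
    # Intersection: pair every derivative of the left operand with
    # every derivative of the right operand.
    return {conj(x, y) for x, y in product(pd(r[1], a), pd(r[2], a))}

def pd_automaton_states(r, alphabet):
    """All expressions reachable from r by partial derivation."""
    seen, todo = {r}, [r]
    while todo:
        q = todo.pop()
        for a in alphabet:
            for t in pd(q, a):
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
    return seen

def accepts(r, word):
    """Run the partial derivative automaton of r on word."""
    current = {r}
    for a in word:
        current = {t for q in current for t in pd(q, a)}
    return any(nullable(q) for q in current)
```

For example, with `S = star(alt(sym("a"), sym("b")))`, the expression `conj(cat(S, sym("a")), cat(sym("a"), S))` describes the words that both end and start with a; `accepts` can be used to check individual words, and `len(pd_automaton_states(r, "ab"))` gives the number of states reached.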

Repositório Aberto da Universidade do Porto

Regular Expressions and Transducers over Alphabet-invariant and User-defined Labels

Author: A Demaille, BG Mirkin, C Allauzen, HJ Shyr, J Brzozowski, J Sakarovitch, JA Brzozowski, JM Champarnaud, K Thompson, M Veanes, M-P Béal, P Caron, R Bastos, S Broda, S Konstantinidis, S Lombardy, VM Antimirov, Y Sheng
Publication date: 04/05/2018

We are interested in regular expressions and transducers that represent word relations in an alphabet-invariant way, for example the set of all word pairs (u, v) where v is a prefix of u, independently of what the alphabet is. Current software systems of formal language objects do not have a mechanism to define such objects. We define transducers in which transition labels involve what we call set specifications, some of which are alphabet invariant. In fact, we give a broader definition of automata-type objects, called labelled graphs, where each transition label can be any string, as long as that string represents a subset of a certain monoid. Then, the behaviour of the labelled graph is a subset of that monoid. We do the same for regular expressions. We obtain extensions of a few classic algorithmic constructions on ordinary regular expressions and transducers at the broad level of labelled graphs and in such a way that the computational efficiency of the extended constructions is not sacrificed. For regular expressions with set specs we obtain the corresponding partial derivative automata. For transducers with set specs we obtain further algorithms that can be applied to questions about independent regular languages, in particular the witness version of the independent property satisfaction question.

arXiv.org e-Print Archive

Inductive Characterizations of Finite Interval Orders and Semiorders

Author: BG Mirkin, BSW Schröder, D Scott, Jean-Xavier Rampon, Jimmy Leblet, N Wiener, PC Fishburn, RD Luce, WT Trotter
Publication venue: 'Springer Science and Business Media LLC'

Rapid Pathway Evolution Facilitated by Horizontal Gene Transfers across Prokaryotic Lineages

Author: A Nakabachi, B Snel, BG Mirkin, C Pal, EV Koonin, FD Ciccarelli, G Hernandez-Montes, GR Johnson, GW Tyson, H Ma, H Nishida, Ivan Matic, M Kanehisa, MA Huynen, MT Madigan, N Takezaki, NH Horowitz, RA Jensen, RG Beiko, S Guindon, S Schmidt, SA Teichmann, SC Rison, T Baba, Toshihisa Takagi, V Kunin, W Iwasaki, Wataru Iwasaki
Publication venue: Public Library of Science
Publication date: 01/03/2009

The evolutionary history of biological pathways is of general interest, especially in this post-genomic era, because it may provide clues for understanding how complex systems encoded on genomes have been organized. To explain how pathways can evolve de novo, some noteworthy models have been proposed. However, direct reconstruction of pathway evolutionary history both on a genomic scale and at the depth of the tree of life has suffered from artificial effects in estimating the gene content of ancestral species. Recently, we developed an algorithm that effectively reconstructs gene-content evolution without these artificial effects, and we applied it to this problem. The carefully reconstructed history, which was based on the metabolic pathways of 160 prokaryotic species, confirmed that pathways have grown beyond the random acquisition of individual genes. Pathway acquisition took place quickly, probably eliminating the difficulty in holding genes during the course of the pathway evolution. This rapid evolution was due to massive horizontal gene transfers as gene groups, some of which were possibly operon transfers, which would convey existing pathways but not be able to generate novel pathways. To this end, we analyzed how these pathways originally appeared and found that the original acquisition of pathways occurred more contemporaneously than expected across different phylogenetic clades. As a possible model to explain this observation, we propose that novel pathway evolution may be facilitated by bidirectional horizontal gene transfers in prokaryotic communities. Such a model would complement existing pathway evolution models.

Exploring Fold Space Preferences of New-born and Ancient Protein Superfamilies

The evolution of proteins is one of the fundamental processes that has delivered the diversity and complexity of life we see around us today. While we tend to define protein evolution in terms of sequence-level mutations, insertions and deletions, it is hard to translate these processes to a more complete picture incorporating a polypeptide's structure and function. By considering how protein structures change over time we can gain an entirely new appreciation of their long-term evolutionary dynamics. In this work we seek to identify how populations of proteins at different stages of evolution explore their possible structure space. We apply an annotation of superfamily age to this space and explore the relationship between these ages and a diverse set of properties pertaining to a superfamily's sequence, structure and function. We note several marked differences between the populations of newly evolved and ancient structures, such as in their length distributions, secondary structure content and tertiary packing arrangements. In particular, many of these differences suggest a less elaborate structure for newly evolved superfamilies when compared with their ancient counterparts. We show that the structural preferences we report are not a residual effect of a more fundamental relationship with function. Furthermore, we demonstrate the robustness of our results, using significant variation in the algorithm used to estimate the ages. We present these age estimates as a useful tool to analyse protein populations. In particular, we apply this in a comparison of domains containing Greek key or jelly roll motifs.

VU Research Portal

Oxford University Research Archive

Comparative Genomics Study of Multi-Drug-Resistance Mechanisms in the Antibiotic-Resistant Streptococcus suis R61 Strain

BACKGROUND: Streptococcus suis infections are a serious problem for both humans and pigs worldwide. The emergence and increasing prevalence of antibiotic-resistant S. suis strains pose significant clinical and societal challenges. RESULTS: In our study, we sequenced one multi-drug-resistant S. suis strain, R61, and one S. suis strain, A7, which is fully sensitive to all tested antibiotics. Comparative genomic analysis revealed that the R61 strain is phylogenetically distinct from other S. suis strains, and the genome of R61 exhibits extreme levels of evolutionary plasticity with high levels of gene gain and loss. Our results indicate that the multi-drug-resistant strain R61 has evolved three main categories of resistance. CONCLUSIONS: Comparative genomic analysis of S. suis strains with diverse drug-resistant phenotypes provided evidence that horizontal gene transfer is an important evolutionary force in shaping the genome of multi-drug-resistant strain R61. In this study, we discovered novel and previously unexamined mutations that are strong candidates for conferring drug resistance. We believe that these mutations will provide crucial clues for designing new drugs against this pathogen. In addition, our work provides a clear demonstration that the use of drugs has driven the emergence of the multi-drug-resistant strain R61.

Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment

Author: AC Gavin, AJ Enright, AK Ramani, B Snel, BG Mirkin, C von Mering, CS Goh, D Barker, D Vallenet, E Kolker, EM Marcotte, ES Snitkin, F Pazos, G Butland, GV Glazko, H Li, H Rachman, H Tettelin, H Wu, HB Fraser, I Lee, I Tirosh, I Uchiyama, J De Las Rivas, J Gertz, J Sun, J Tamames, J Wu, JB Pereira-Leal, JC Mellor, JC Rain, JF Rual, JM Peregrin-Alvarez, K Jim, K Tan, L Aravind, L Giot, M Campillos, M Levesque, M Pellegrini, M Strong, M Wu, MA Huynen, MG Kann, MJ Martin, ML Green, MY Galperin, N Lopez-Bigas, NJ Krogan, NS Baliga, P Pagel, P Shannon, P Ternes, P Uetz, PM Bowers, R Bonneau, R Jothi, R Overbeek, RA Gutierrez, Raja Jothi, RL Tatusov, SB Hedges, SF Altschul, SV Date, T Dandekar, T Gaasterland, T Ito, T Sato, T Wang, T Yamada, Teresa M Przytycka, TF Deluca, TS Mikkelsen, U Stelzl, V Kunin, Y Kim, Y Ye, Y Zheng, Y Zhou, Z Su, ZI Johnson
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: A widely used approach for discovering functional and physical interactions among proteins involves phylogenetic profile comparisons (PPCs). Here, proteins with similar profiles are inferred to be functionally related under the assumption that proteins involved in the same metabolic pathway or cellular system are likely to have been co-inherited during evolution. Results: Our experimentation with E. coli and yeast proteins with 16 different carefully composed reference sets of genomes revealed that the phyletic patterns of proteins in prokaryotes alone could be adequate to make reasonably accurate functional linkage predictions. A slight improvement in performance is observed on adding a few eukaryotes to the reference set, but a noticeable drop-off in performance is observed with an increased number of eukaryotes. Inclusion of most parasitic, pathogenic or vertebrate genomes and multiple strains of the same species in the reference set does not necessarily contribute to improved sensitivity or accuracy. Interestingly, we also found that the evolutionary histories of individual pathways have a significant effect on the performance of the PPC approach with respect to a particular reference set. For example, to accurately predict functional links in carbohydrate or lipid metabolism, a reference set solely composed of prokaryotic (or bacterial) genomes performed among the best compared to one composed of genomes from all three super-kingdoms; this is in contrast to predicting functional links in translation, for which a reference set composed of prokaryotic (or bacterial) genomes performed the worst. We also demonstrate that the widely used random null model to quantify the statistical significance of profile similarity is incomplete, which could result in an increased number of false positives.
Conclusion: Contrary to previous proposals, it is not merely the number of genomes but a careful selection of informative genomes in the reference set that influences the prediction accuracy of the PPC approach. We note that the predictive power of the PPC approach, especially in eukaryotes, is heavily influenced by the primary endosymbiosis and subsequent bacterial contributions. The over-representation of parasitic unicellular eukaryotes and vertebrates additionally makes eukaryotes less useful in the reference sets. Reference sets composed of a highly non-redundant set of genomes from all three super-kingdoms fare better with pathways showing considerable vertical inheritance and strong conservation (e.g. translation apparatus), while reference sets solely composed of prokaryotic genomes fare better for more variable pathways like carbohydrate metabolism. Differential performance of the PPC approach on various pathways, and a weak positive correlation between functional and profile similarities, suggest that caution should be exercised while interpreting functional linkages inferred from genome-wide large-scale profile comparisons using a single reference set.
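
The idea behind a phylogenetic profile comparison can be sketched in a few lines. The abstract does not say which similarity measure or threshold the authors used, so the Jaccard index, the `predict_links` helper, the protein names, and the toy profiles below are all assumptions made for illustration. Each profile is a presence/absence vector over the genomes of a reference set, and two proteins are linked when their profiles are similar enough.

```python
from itertools import combinations

def jaccard(p, q):
    """Jaccard similarity of two presence/absence profiles of equal length."""
    both = sum(1 for x, y in zip(p, q) if x and y)
    either = sum(1 for x, y in zip(p, q) if x or y)
    return both / either if either else 0.0

def predict_links(profiles, threshold):
    """Return protein pairs whose profile similarity reaches the threshold.

    profiles maps a protein name to a tuple of 0/1 flags, one flag per
    genome in the reference set (1 = a homolog is present in that genome).
    """
    links = []
    for a, b in combinations(sorted(profiles), 2):
        score = jaccard(profiles[a], profiles[b])
        if score >= threshold:
            links.append((a, b, score))
    return links

# Hypothetical toy data: three proteins profiled over five reference genomes.
toy_profiles = {
    "protA": (1, 1, 0, 1, 0),
    "protB": (1, 1, 0, 1, 1),
    "protC": (0, 0, 1, 0, 1),
}
toy_links = predict_links(toy_profiles, threshold=0.5)
```

Changing which genomes make up the columns of the profiles changes the scores, which is the effect of reference-set composition that the abstract investigates.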

Springer - Publisher Connector